Handbook of Research on Big Data and the IoT by Kaur Gurjit
Author: Kaur Gurjit
Language: eng
Format: epub
Publisher: Engineering Science Reference
4.1.3 Data Storage
HDFS (Hadoop Distributed File System), S3 (Amazon Simple Storage Service)
Servers: EC2, Google App Engine, Elastic Beanstalk, Heroku
4.1.4 Data Processing
R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop
We now examine two of the most popular Big Data processing frameworks, MapReduce and Hadoop, in detail.
4.2. MapReduce
MapReduce is a computational framework for processing large datasets with distributed algorithms running on clusters. It comprises user-defined Map and Reduce functions together with a MapReduce library. Map functions process the data in parallel; their output is sorted and then processed by Reduce functions. The MapReduce library parallelizes the work by breaking the data into smaller chunks, which are processed using a master/slave implementation. Typically, the MapReduce framework is implemented in six steps as follows.
Step 1: Read the input data from the Hadoop Distributed File System (HDFS).
Step 2: Split the job into smaller tasks.
Step 3: Pass the input key/value pairs to the Map function to generate intermediate key/value pairs.
Step 4: From the output of the Map function, identify all pairs with the same key and send them to the Reduce function.
Step 5: Sort the input to the Reduce function by key.
Step 6: Write the output of the Reduce function to HDFS.
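The six steps above can be illustrated with a minimal, single-process sketch in Python. It is not Hadoop code: an in-memory list of lines stands in for HDFS, word counting is chosen purely as an example job, and every name (split_input, map_fn, shuffle, reduce_fn, run_job) is hypothetical. A real cluster would run the map and reduce tasks in parallel on separate nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Step 2: break the input into smaller chunks, one per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Step 3: emit an intermediate (word, 1) pair for every word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Step 4: collect all values that share the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines: List[str], chunk_size: int = 2) -> List[Tuple[str, int]]:
    # Step 1: 'lines' is assumed to have been read already (stand-in for HDFS).
    chunks = split_input(lines, chunk_size)
    # Step 3: run the map function over every line of every chunk.
    intermediate = [pair for chunk in chunks
                    for line in chunk
                    for pair in map_fn(line)]
    # Step 4: group intermediate pairs by key.
    groups = shuffle(intermediate)
    # Step 5: visit keys in sorted order; Step 6: return the reduced output
    # (a real job would write it back to HDFS instead).
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    sample = ["big data big", "data iot", "big iot"]
    for word, count in run_job(sample):
        print(word, count)
```

In this sketch the grouping in shuffle and the sorted iteration in run_job are done sequentially on one machine; in a distributed setting they correspond to the shuffle-and-sort phase between the map and reduce tasks.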
